Methodology Report: Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing
Abstract
Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. Clustering these sequences on the basis of biological and molecular features is called binning. A previously reported binning strategy that combines oligonucleotide frequency with self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of di-, tri-, tetra-, and pentanucleotide frequencies as training features. The results show that, unlike the other three, dinucleotide frequency is not a sufficiently strong signature for binning 10 kb DNA sequences. Furthermore, we observed that increasing the order of the oligonucleotide frequency can worsen the assignment result in some cases, which suggests that an optimal, species-specific oligonucleotide frequency may exist. We replaced the SOM with a growing self-organising map (GSOM), which gives comparable results with a 7%–15% speed improvement.
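The oligonucleotide frequency features mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `kmer_frequencies`, the sliding-window counting, the normalisation to relative frequencies, and the skipping of windows with ambiguous bases are all assumptions made for illustration. The abstract does not state how the frequencies were normalised or whether both strands were counted.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative k-mer frequencies of seq, ordered lexicographically over ACGT.

    Hypothetical helper for illustration only; k=2..5 corresponds to the
    di-, tri-, tetra-, and pentanucleotide frequencies named in the abstract.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with N or other ambiguity codes
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# A tetranucleotide feature vector has 4**4 = 256 dimensions.
vector = kmer_frequencies("ACGTACGTTTGACCA", 4)
print(len(vector))
```

Each sequence fragment then becomes a fixed-length vector (16, 64, 256, or 1024 values for k = 2 to 5), which is the kind of input a SOM or GSOM can be trained on.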